Loading and preliminary cleaning of the data

The data analyzed in this document come from a compilation made by Kaggle users. The analysis began on 22 August 2022 and uses version 166 of the dataset as published on that site.

Loading the dataset correctly

Loading the dataset from Python

import pandas as pd
datos = pd.read_csv("data/covid_19_clean_complete.csv")
datos.head(10)
##                  Province/State  ...             WHO Region
## 0                           NaN  ...  Eastern Mediterranean
## 1                           NaN  ...                 Europe
## 2                           NaN  ...                 Africa
## 3                           NaN  ...                 Europe
## 4                           NaN  ...                 Africa
## 5                           NaN  ...               Americas
## 6                           NaN  ...               Americas
## 7                           NaN  ...                 Europe
## 8  Australian Capital Territory  ...        Western Pacific
## 9               New South Wales  ...        Western Pacific
## 
## [10 rows x 10 columns]

Loading the dataset with the reticulate library

library(reticulate) # provides import()
library(knitr)      # provides kable()
pd <- import("pandas")
datos <- pd$read_csv("data/covid_19_clean_complete.csv")
kable(head(datos, 10))
Province/State Country/Region Lat Long Date Confirmed Deaths Recovered Active WHO Region
NaN Afghanistan 33.93911 67.70995 2020-01-22 0 0 0 0 Eastern Mediterranean
NaN Albania 41.15330 20.16830 2020-01-22 0 0 0 0 Europe
NaN Algeria 28.03390 1.65960 2020-01-22 0 0 0 0 Africa
NaN Andorra 42.50630 1.52180 2020-01-22 0 0 0 0 Europe
NaN Angola -11.20270 17.87390 2020-01-22 0 0 0 0 Africa
NaN Antigua and Barbuda 17.06080 -61.79640 2020-01-22 0 0 0 0 Americas
NaN Argentina -38.41610 -63.61670 2020-01-22 0 0 0 0 Americas
NaN Armenia 40.06910 45.03820 2020-01-22 0 0 0 0 Europe
Australian Capital Territory Australia -35.47350 149.01240 2020-01-22 0 0 0 0 Western Pacific
New South Wales Australia -33.86880 151.20930 2020-01-22 0 0 0 0 Western Pacific

Loading the dataset from R

datos <- read.csv("data/covid_19_clean_complete.csv", stringsAsFactors = TRUE)
datos %>% head(10) %>% kable()
Province.State Country.Region Lat Long Date Confirmed Deaths Recovered Active WHO.Region
Afghanistan 33.93911 67.70995 2020-01-22 0 0 0 0 Eastern Mediterranean
Albania 41.15330 20.16830 2020-01-22 0 0 0 0 Europe
Algeria 28.03390 1.65960 2020-01-22 0 0 0 0 Africa
Andorra 42.50630 1.52180 2020-01-22 0 0 0 0 Europe
Angola -11.20270 17.87390 2020-01-22 0 0 0 0 Africa
Antigua and Barbuda 17.06080 -61.79640 2020-01-22 0 0 0 0 Americas
Argentina -38.41610 -63.61670 2020-01-22 0 0 0 0 Americas
Armenia 40.06910 45.03820 2020-01-22 0 0 0 0 Europe
Australian Capital Territory Australia -35.47350 149.01240 2020-01-22 0 0 0 0 Western Pacific
New South Wales Australia -33.86880 151.20930 2020-01-22 0 0 0 0 Western Pacific

Data structure and renaming the columns

str(datos)
## 'data.frame':    49068 obs. of  10 variables:
##  $ Province.State: Factor w/ 79 levels "","Alberta","Anguilla",..: 1 1 1 1 1 1 1 1 6 47 ...
##  $ Country.Region: Factor w/ 187 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 9 ...
##  $ Lat           : num  33.9 41.2 28 42.5 -11.2 ...
##  $ Long          : num  67.71 20.17 1.66 1.52 17.87 ...
##  $ Date          : Factor w/ 188 levels "2020-01-22","2020-01-23",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Confirmed     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Deaths        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Recovered     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Active        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ WHO.Region    : Factor w/ 6 levels "Africa","Americas",..: 3 4 1 4 1 2 2 4 6 6 ...
colnames(datos) = c("Provincia_Estado",
                    "Pais_Region",
                    "Latitud", # N+ or S-
                    "Longitud", # E+ or W-
                    "Fecha",
                    "Casos_Confirmados",
                    "Casos_Muertos",
                    "Casos_Recuperados",
                    "Casos_Activos",
                    "WHO_Region"
                    )
datos %>% head() %>% kable() # %>% kable_styling()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region
Afghanistan 33.93911 67.70995 2020-01-22 0 0 0 0 Eastern Mediterranean
Albania 41.15330 20.16830 2020-01-22 0 0 0 0 Europe
Algeria 28.03390 1.65960 2020-01-22 0 0 0 0 Africa
Andorra 42.50630 1.52180 2020-01-22 0 0 0 0 Europe
Angola -11.20270 17.87390 2020-01-22 0 0 0 0 Africa
Antigua and Barbuda 17.06080 -61.79640 2020-01-22 0 0 0 0 Americas

Data type of each column

  • Qualitative variables are converted with factor or as.factor.
  • Ordinal variables are converted with ordered.
  • Quantitative variables are converted with as.numeric.

The date data type and its manipulation

Convert the Fecha column to type Date:

#datos$Fecha %<>% as.Date(format="%Y-%m-%d")
datos$Fecha %<>% ymd() # With the lubridate library
str(datos)
## 'data.frame':    49068 obs. of  10 variables:
##  $ Provincia_Estado : Factor w/ 79 levels "","Alberta","Anguilla",..: 1 1 1 1 1 1 1 1 6 47 ...
##  $ Pais_Region      : Factor w/ 187 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 9 ...
##  $ Latitud          : num  33.9 41.2 28 42.5 -11.2 ...
##  $ Longitud         : num  67.71 20.17 1.66 1.52 17.87 ...
##  $ Fecha            : Date, format: "2020-01-22" "2020-01-22" ...
##  $ Casos_Confirmados: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Casos_Muertos    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Casos_Recuperados: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Casos_Activos    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ WHO_Region       : Factor w/ 6 levels "Africa","Americas",..: 3 4 1 4 1 2 2 4 6 6 ...
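
For comparison with the Python loading path shown earlier, here is a minimal sketch of the same string-to-date conversion using only Python's standard library. The helper name parse_iso_date and the variable names are illustrative assumptions, not part of the original analysis. It parses strings in the same YYYY-MM-DD format as the date column and counts the days the dataset covers.

```python
from datetime import date, datetime

# Illustrative values: the first and last dates that appear in the dataset.
first_raw = "2020-01-22"
last_raw = "2020-07-27"

def parse_iso_date(text: str) -> date:
    """Parse a 'YYYY-MM-DD' string into a date object."""
    return datetime.strptime(text, "%Y-%m-%d").date()

first_day = parse_iso_date(first_raw)
last_day = parse_iso_date(last_raw)

# Once parsed, dates support arithmetic: inclusive number of days covered.
n_days = (last_day - first_day).days + 1
print(first_day, last_day, n_days)
```

The inclusive count is 188, which agrees with the 188 distinct date levels that str reported before the conversion.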

\[\text{Confirmed cases} = \text{Deaths} + \text{Recovered} + \text{Sick}\]

# The following gives the same values as the Casos_Activos column, but that
# column was not present in the dataset used in the course
datos %<>% # %<>% (magrittr) pipes datos through mutate() and assigns the result back
  mutate(Casos_Enfermos = Casos_Confirmados - Casos_Muertos - Casos_Recuperados)

datos %>%
  filter(Casos_Confirmados > 10000) %>%
  head() %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos
Hubei China 30.9756 112.2707 2020-02-02 11177 350 295 10532 Western Pacific 10532
Hubei China 30.9756 112.2707 2020-02-03 13522 414 386 12722 Western Pacific 12722
Hubei China 30.9756 112.2707 2020-02-04 16678 479 522 15677 Western Pacific 15677
Hubei China 30.9756 112.2707 2020-02-05 19665 549 633 18483 Western Pacific 18483
Hubei China 30.9756 112.2707 2020-02-06 22112 618 817 20677 Western Pacific 20677
Hubei China 30.9756 112.2707 2020-02-07 24953 699 1115 23139 Western Pacific 23139
datos %>%
  filter(Casos_Enfermos < 0) %>%
  arrange(Provincia_Estado, Fecha) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos
Liechtenstein 47.140000 9.55000 2020-06-23 82 2 81 -1 Europe -1
Uganda 1.373333 32.29028 2020-07-20 1069 0 1071 -2 Africa -2
Channel Islands United Kingdom 49.372300 -2.36440 2020-05-23 558 45 515 -2 Europe -2
Channel Islands United Kingdom 49.372300 -2.36440 2020-05-24 558 45 517 -4 Europe -4
Channel Islands United Kingdom 49.372300 -2.36440 2020-05-25 559 45 517 -3 Europe -3
Channel Islands United Kingdom 49.372300 -2.36440 2020-05-30 560 45 525 -10 Europe -10
Channel Islands United Kingdom 49.372300 -2.36440 2020-05-31 560 45 528 -13 Europe -13
Channel Islands United Kingdom 49.372300 -2.36440 2020-06-01 560 45 528 -13 Europe -13
Channel Islands United Kingdom 49.372300 -2.36440 2020-06-02 560 46 528 -14 Europe -14
Hainan China 19.195900 109.74530 2020-03-24 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-25 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-26 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-27 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-28 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-29 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-30 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-03-31 168 6 168 -6 Western Pacific -6
Hainan China 19.195900 109.74530 2020-04-01 168 6 168 -6 Western Pacific -6
datos %>%
  filter(Provincia_Estado == "Hainan") %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos
Hainan China 19.1959 109.7453 2020-01-22 4 0 0 4 Western Pacific 4
Hainan China 19.1959 109.7453 2020-01-23 5 0 0 5 Western Pacific 5
Hainan China 19.1959 109.7453 2020-01-24 8 0 0 8 Western Pacific 8
Hainan China 19.1959 109.7453 2020-01-25 19 0 0 19 Western Pacific 19
Hainan China 19.1959 109.7453 2020-01-26 22 0 0 22 Western Pacific 22
Hainan China 19.1959 109.7453 2020-01-27 33 1 0 32 Western Pacific 32
Hainan China 19.1959 109.7453 2020-01-28 40 1 0 39 Western Pacific 39
Hainan China 19.1959 109.7453 2020-01-29 43 1 0 42 Western Pacific 42
Hainan China 19.1959 109.7453 2020-01-30 46 1 1 44 Western Pacific 44
Hainan China 19.1959 109.7453 2020-01-31 52 1 1 50 Western Pacific 50
Hainan China 19.1959 109.7453 2020-02-01 62 1 1 60 Western Pacific 60
Hainan China 19.1959 109.7453 2020-02-02 64 1 4 59 Western Pacific 59
Hainan China 19.1959 109.7453 2020-02-03 72 1 4 67 Western Pacific 67
Hainan China 19.1959 109.7453 2020-02-04 80 1 5 74 Western Pacific 74
Hainan China 19.1959 109.7453 2020-02-05 99 1 5 93 Western Pacific 93
Hainan China 19.1959 109.7453 2020-02-06 106 1 8 97 Western Pacific 97
Hainan China 19.1959 109.7453 2020-02-07 117 2 10 105 Western Pacific 105
Hainan China 19.1959 109.7453 2020-02-08 124 2 14 108 Western Pacific 108
Hainan China 19.1959 109.7453 2020-02-09 131 3 19 109 Western Pacific 109
Hainan China 19.1959 109.7453 2020-02-10 138 3 19 116 Western Pacific 116
Hainan China 19.1959 109.7453 2020-02-11 144 3 20 121 Western Pacific 121
Hainan China 19.1959 109.7453 2020-02-12 157 4 27 126 Western Pacific 126
Hainan China 19.1959 109.7453 2020-02-13 157 4 30 123 Western Pacific 123
Hainan China 19.1959 109.7453 2020-02-14 159 4 43 112 Western Pacific 112
Hainan China 19.1959 109.7453 2020-02-15 162 4 39 119 Western Pacific 119
Hainan China 19.1959 109.7453 2020-02-16 162 4 52 106 Western Pacific 106
Hainan China 19.1959 109.7453 2020-02-17 163 4 59 100 Western Pacific 100
Hainan China 19.1959 109.7453 2020-02-18 163 4 79 80 Western Pacific 80
Hainan China 19.1959 109.7453 2020-02-19 168 4 84 80 Western Pacific 80
Hainan China 19.1959 109.7453 2020-02-20 168 4 86 78 Western Pacific 78
Hainan China 19.1959 109.7453 2020-02-21 168 4 95 69 Western Pacific 69
Hainan China 19.1959 109.7453 2020-02-22 168 4 104 60 Western Pacific 60
Hainan China 19.1959 109.7453 2020-02-23 168 5 106 57 Western Pacific 57
Hainan China 19.1959 109.7453 2020-02-24 168 5 116 47 Western Pacific 47
Hainan China 19.1959 109.7453 2020-02-25 168 5 124 39 Western Pacific 39
Hainan China 19.1959 109.7453 2020-02-26 168 5 129 34 Western Pacific 34
Hainan China 19.1959 109.7453 2020-02-27 168 5 131 32 Western Pacific 32
Hainan China 19.1959 109.7453 2020-02-28 168 5 133 30 Western Pacific 30
Hainan China 19.1959 109.7453 2020-02-29 168 5 148 15 Western Pacific 15
Hainan China 19.1959 109.7453 2020-03-01 168 5 149 14 Western Pacific 14
Hainan China 19.1959 109.7453 2020-03-02 168 5 151 12 Western Pacific 12
Hainan China 19.1959 109.7453 2020-03-03 168 5 155 8 Western Pacific 8
Hainan China 19.1959 109.7453 2020-03-04 168 5 158 5 Western Pacific 5
Hainan China 19.1959 109.7453 2020-03-05 168 6 158 4 Western Pacific 4
Hainan China 19.1959 109.7453 2020-03-06 168 6 158 4 Western Pacific 4
Hainan China 19.1959 109.7453 2020-03-07 168 6 158 4 Western Pacific 4
Hainan China 19.1959 109.7453 2020-03-08 168 6 159 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-03-09 168 6 159 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-03-10 168 6 159 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-03-11 168 6 159 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-03-12 168 6 160 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-03-13 168 6 160 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-03-14 168 6 160 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-03-15 168 6 160 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-03-16 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-17 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-18 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-19 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-20 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-21 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-22 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-23 168 6 161 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-03-24 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-25 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-26 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-27 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-28 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-29 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-30 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-03-31 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-04-01 168 6 168 -6 Western Pacific -6
Hainan China 19.1959 109.7453 2020-04-02 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-03 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-04 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-05 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-06 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-07 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-08 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-09 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-10 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-11 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-12 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-13 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-14 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-15 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-16 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-17 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-18 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-19 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-20 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-21 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-22 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-23 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-24 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-25 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-26 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-27 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-28 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-29 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-30 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-01 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-02 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-03 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-04 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-05 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-06 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-07 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-08 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-09 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-10 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-11 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-12 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-13 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-14 168 6 162 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-15 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-16 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-17 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-18 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-19 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-20 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-21 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-22 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-23 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-24 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-25 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-26 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-27 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-28 169 6 162 1 Western Pacific 1
Hainan China 19.1959 109.7453 2020-05-29 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-30 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-05-31 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-01 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-02 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-03 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-04 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-05 169 6 163 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-06 170 6 162 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-07 170 6 162 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-08 170 6 162 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-09 170 6 162 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-10 170 6 162 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-11 170 6 162 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-12 171 6 162 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-06-13 171 6 162 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-06-14 171 6 162 3 Western Pacific 3
Hainan China 19.1959 109.7453 2020-06-15 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-16 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-17 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-18 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-19 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-20 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-21 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-22 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-23 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-24 171 6 163 2 Western Pacific 2
Hainan China 19.1959 109.7453 2020-06-25 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-26 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-27 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-28 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-29 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-06-30 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-01 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-02 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-03 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-04 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-05 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-06 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-07 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-08 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-09 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-10 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-11 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-12 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-13 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-14 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-15 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-16 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-17 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-18 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-19 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-20 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-21 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-22 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-23 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-24 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-25 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-26 171 6 165 0 Western Pacific 0
Hainan China 19.1959 109.7453 2020-07-27 171 6 165 0 Western Pacific 0

Anomalous and nonsensical data

# Correct anomalous and nonsensical data.
# Note: the result is only displayed; it is not assigned back to datos.
datos %>%
  filter(Provincia_Estado == "Hainan", Casos_Enfermos < 0) %>%
  mutate(Casos_Recuperados = Casos_Recuperados + Casos_Enfermos,
         Casos_Enfermos = 0) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos
Hainan China 19.1959 109.7453 2020-03-24 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-25 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-26 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-27 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-28 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-29 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-30 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-03-31 168 6 162 -6 Western Pacific 0
Hainan China 19.1959 109.7453 2020-04-01 168 6 162 -6 Western Pacific 0
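
The correction rule applied to the Hainan rows can be sketched outside R as follows. This is a minimal, hypothetical illustration in plain Python: the function name fix_negative_active and the dict-based rows are assumptions, not part of the original analysis. When the sick count comes out negative, the excess is taken out of the recovered count and the sick count is set to zero, so that confirmed = deaths + recovered + sick holds again.

```python
def fix_negative_active(row: dict) -> dict:
    """Return a copy of row where a negative sick count is taken out of recovered."""
    fixed = dict(row)
    active = fixed["confirmed"] - fixed["deaths"] - fixed["recovered"]
    if active < 0:
        # active is negative here, so adding it lowers the recovered count
        fixed["recovered"] += active
        active = 0
    fixed["active"] = active
    return fixed

# Hainan on 2020-03-24, values as listed in the tables of this section
hainan = {"confirmed": 168, "deaths": 6, "recovered": 168}
corrected = fix_negative_active(hainan)
print(corrected)
```

Whether the surplus should really be attributed to the recovered count is a judgment call; the sketch only mirrors the rule used in this section.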

Geographic analysis of the data

#datos_europa = datos[datos$Latitud > 38 & datos$Longitud > -25 & datos$Longitud < 30 , ]
datos_europa = datos %>%
  filter(Latitud > 38, between(Longitud, -25, 30))

nrow(datos_europa)
## [1] 8460
table(datos_europa$Pais_Region) %>%
  as.data.frame() %>%
  filter(Freq > 0) %>%
  kable()
Var1 Freq
Albania 188
Andorra 188
Austria 188
Belarus 188
Belgium 188
Bosnia and Herzegovina 188
Bulgaria 188
Croatia 188
Czechia 188
Denmark 376
Estonia 188
Finland 188
France 188
Germany 188
Greece 188
Holy See 188
Hungary 188
Iceland 188
Ireland 188
Italy 188
Kosovo 188
Latvia 188
Liechtenstein 188
Lithuania 188
Luxembourg 188
Moldova 188
Monaco 188
Montenegro 188
Netherlands 188
North Macedonia 188
Norway 188
Poland 188
Portugal 188
Romania 188
San Marino 188
Serbia 188
Slovakia 188
Slovenia 188
Spain 188
Sweden 188
Switzerland 188
United Kingdom 564
datos_europa %>%
  filter(Fecha == ymd("2020-03-15")) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos
Albania 41.15330 20.168300 2020-03-15 42 1 0 41 Europe 41
Andorra 42.50630 1.521800 2020-03-15 1 0 1 0 Europe 0
Austria 47.51620 14.550100 2020-03-15 860 1 6 853 Europe 853
Belarus 53.70980 27.953400 2020-03-15 27 0 3 24 Europe 24
Belgium 50.83330 4.469936 2020-03-15 886 4 1 881 Europe 881
Bosnia and Herzegovina 43.91590 17.679100 2020-03-15 24 0 0 24 Europe 24
Bulgaria 42.73390 25.485800 2020-03-15 51 2 0 49 Europe 49
Croatia 45.10000 15.200000 2020-03-15 49 0 1 48 Europe 48
Czechia 49.81750 15.473000 2020-03-15 253 0 0 253 Europe 253
Faroe Islands Denmark 61.89260 -6.911800 2020-03-15 11 0 0 11 Europe 11
Denmark 56.26390 9.501800 2020-03-15 864 2 1 861 Europe 861
Estonia 58.59530 25.013600 2020-03-15 171 0 1 170 Europe 170
Finland 61.92411 25.748151 2020-03-15 244 0 10 234 Europe 234
France 46.22760 2.213700 2020-03-15 4499 91 12 4396 Europe 4396
Germany 51.16569 10.451526 2020-03-15 5795 11 46 5738 Europe 5738
Greece 39.07420 21.824300 2020-03-15 331 4 8 319 Europe 319
Holy See 41.90290 12.453400 2020-03-15 1 0 0 1 Europe 1
Hungary 47.16250 19.503300 2020-03-15 32 1 1 30 Europe 30
Iceland 64.96310 -19.020800 2020-03-15 171 5 8 158 Europe 158
Ireland 53.14240 -7.692100 2020-03-15 129 2 0 127 Europe 127
Italy 41.87194 12.567380 2020-03-15 24747 1809 2335 20603 Europe 20603
Latvia 56.87960 24.603200 2020-03-15 30 0 1 29 Europe 29
Liechtenstein 47.14000 9.550000 2020-03-15 4 0 1 3 Europe 3
Lithuania 55.16940 23.881300 2020-03-15 12 0 1 11 Europe 11
Luxembourg 49.81530 6.129600 2020-03-15 59 1 0 58 Europe 58
Moldova 47.41160 28.369900 2020-03-15 23 0 0 23 Europe 23
Monaco 43.73330 7.416700 2020-03-15 2 0 0 2 Europe 2
Montenegro 42.70868 19.374390 2020-03-15 0 0 0 0 Europe 0
Netherlands 52.13260 5.291300 2020-03-15 1135 20 0 1115 Europe 1115
North Macedonia 41.60860 21.745300 2020-03-15 14 0 1 13 Europe 13
Norway 60.47200 8.468900 2020-03-15 1221 3 1 1217 Europe 1217
Poland 51.91940 19.145100 2020-03-15 119 3 0 116 Europe 116
Portugal 39.39990 -8.224500 2020-03-15 245 0 2 243 Europe 243
Romania 45.94320 24.966800 2020-03-15 131 0 9 122 Europe 122
San Marino 43.94240 12.457800 2020-03-15 101 5 4 92 Europe 92
Serbia 44.01650 21.005900 2020-03-15 48 0 0 48 Europe 48
Slovakia 48.66900 19.699000 2020-03-15 54 0 0 54 Europe 54
Slovenia 46.15120 14.995500 2020-03-15 219 1 0 218 Europe 218
Spain 40.46367 -3.749220 2020-03-15 7798 289 517 6992 Europe 6992
Sweden 60.12816 18.643501 2020-03-15 1022 3 0 1019 Europe 1019
Switzerland 46.81820 8.227500 2020-03-15 2200 14 4 2182 Europe 2182
Channel Islands United Kingdom 49.37230 -2.364400 2020-03-15 3 0 0 3 Europe 3
Isle of Man United Kingdom 54.23610 -4.548100 2020-03-15 0 0 0 0 Europe 0
United Kingdom 55.37810 -3.436000 2020-03-15 3072 43 18 3011 Europe 3011
Kosovo 42.60264 20.902977 2020-03-15 0 0 0 0 Europe 0
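
The rule used to select European rows in this section is a simple bounding box: latitude above 38 and longitude between -25 and 30, both ends included. A minimal sketch of that predicate in plain Python follows; the function name in_europe_box is an assumption made for illustration, and the coordinates are copied from the tables in this document.

```python
def in_europe_box(lat: float, lon: float) -> bool:
    """Bounding-box rule: latitude > 38 and -25 <= longitude <= 30."""
    return lat > 38 and -25 <= lon <= 30

# (latitude, longitude) pairs taken from the tables in this document
samples = {
    "Spain": (40.46367, -3.74922),
    "Algeria": (28.0339, 1.6596),
    "Armenia": (40.0691, 45.0382),
}
inside = {name: in_europe_box(lat, lon) for name, (lat, lon) in samples.items()}
print(inside)
```

A rectangular box is only a rough approximation of Europe: it decides membership from the single coordinate pair stored for each country or province, not from the actual borders.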

Practical exercise: my trip to Potsdam

Euclidean distance:

\[d(x, y) = \sqrt{(x_{Lat} - y_{Lat})^2 + (x_{Long} - y_{Long})^2}\]

distancia_grados <- function(x, y){
  sqrt((x[1] - y[1])^2 + (x[2] - y[2])^2)
}

distancia_grados_potsdam <- function(x){
  potsdam = c(52.366956, 13.906734)
  distancia_grados(x, potsdam)
}

dist_potsdam <- apply(cbind(datos_europa$Latitud, datos_europa$Longitud),
                      MARGIN = 1,
                      FUN = distancia_grados_potsdam)

datos_europa %<>%
  mutate(dist_potsdam = dist_potsdam)

datos_europa %>%
  filter(between(Fecha, dmy("02-03-2020"), dmy("07-03-2020")),
         dist_potsdam < 4) %>% # Radius of less than 4 degrees
  arrange(Pais_Region) %>% # Sort by country
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos dist_potsdam
Czechia 49.81750 15.47300 2020-03-02 3 0 0 3 Europe 3 2.992142
Czechia 49.81750 15.47300 2020-03-03 5 0 0 5 Europe 5 2.992142
Czechia 49.81750 15.47300 2020-03-04 8 0 0 8 Europe 8 2.992142
Czechia 49.81750 15.47300 2020-03-05 12 0 0 12 Europe 12 2.992142
Czechia 49.81750 15.47300 2020-03-06 18 0 0 18 Europe 18 2.992142
Czechia 49.81750 15.47300 2020-03-07 19 0 0 19 Europe 19 2.992142
Germany 51.16569 10.45153 2020-03-02 159 0 16 143 Europe 143 3.658073
Germany 51.16569 10.45153 2020-03-03 196 0 16 180 Europe 180 3.658073
Germany 51.16569 10.45153 2020-03-04 262 0 16 246 Europe 246 3.658073
Germany 51.16569 10.45153 2020-03-05 482 0 16 466 Europe 466 3.658073
Germany 51.16569 10.45153 2020-03-06 670 0 17 653 Europe 653 3.658073
Germany 51.16569 10.45153 2020-03-07 799 0 18 781 Europe 781 3.658073
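
The distance used in this exercise is measured in degrees, not kilometres, and one degree of longitude shrinks as latitude increases. As a hedged illustration, the following plain-Python sketch recomputes the degree distance for Czechia and compares it with a great-circle (haversine) distance. The haversine function and the mean Earth radius of 6371 km are assumptions added here for comparison; they are not part of the original analysis.

```python
import math

POTSDAM = (52.366956, 13.906734)  # (lat, lon) reference point used in this exercise

def degree_distance(a, b):
    """Euclidean distance in degrees between two (lat, lon) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km; radius_km is an assumed mean Earth radius."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

czechia = (49.8175, 15.473)  # coordinates from the table in this section
d_deg = degree_distance(czechia, POTSDAM)
d_km = haversine_km(czechia, POTSDAM)
print(round(d_deg, 6), round(d_km, 1))
```

The degree distance reproduces the dist_potsdam value listed for Czechia (about 2.992). The kilometre figure depends on the assumed radius and on using a single coordinate pair per country, so it should be read as a rough indication only.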

World maps with rnaturalearth

# rnaturalearthdata must be installed beforehand
#install.packages("rnaturalearthdata")
world <- ne_countries(scale = "medium", returnclass = "sf")

datos$Pais_Region = factor(datos$Pais_Region, levels = c(levels(datos$Pais_Region), "United States"))
datos[datos$Pais_Region == "US", ]$Pais_Region = "United States"

world %>% 
  inner_join(datos, by = c("name" = "Pais_Region")) %>%
  filter(Fecha == dmy("30-05-2020")) %>%
  ggplot() +
  geom_sf(color = "black", aes(fill = Casos_Confirmados)) +
  #coord_sf(crs = "+proj=laea +lat_0=50 +lon_0=10 +units=m +ellps=GRS80") +
  scale_fill_viridis_c(option = "plasma", trans = "sqrt") +
  xlab("Longitude") + ylab("Latitude") +
  ggtitle("World map",  subtitle = "COVID-19") -> g

ggplotly(g) # make the map interactive

Some regions are left unpainted because country names are not standardized across the two datasets. The United States has been fixed; it would be advisable to do the same for the remaining missing countries.
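
One way to find the remaining mismatches is to compare the two sets of country names and maintain a small rename table. The sketch below shows the idea in plain Python under stated assumptions: the name lists are hypothetical stand-ins for the country column of the dataset and the name column of the map, and only the "US" entry of the rename table comes from this document.

```python
# Hypothetical name lists; the real ones would come from the dataset's
# country column and from the map's name column.
dataset_names = {"US", "Spain", "Italy"}
map_names = {"United States", "Spain", "Italy"}

# Manual renames; only the "US" entry comes from the analysis in this document.
renames = {"US": "United States"}

def harmonize(names, renames):
    """Apply the rename table, leaving names without an entry untouched."""
    return {renames.get(name, name) for name in names}

unmatched_before = sorted(dataset_names - map_names)
unmatched_after = sorted(harmonize(dataset_names, renames) - map_names)
print(unmatched_before, unmatched_after)
```

Listing the unmatched names first shows which entries still need to be added to the rename table before the join.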

Minimalist map with points

datos %>%
  filter(Fecha == dmy("30/03/2020")) %>%
  ggplot(aes(x = Longitud, y = Latitud)) +
  geom_point(aes(size = log(Casos_Confirmados + 1),
                 colour = log(Casos_Muertos + 1))) +
  coord_fixed() +
  theme(legend.position = "bottom") -> g

ggplotly(g) # make the map interactive

Top infected countries

thresh = 1000
datos %>%
  filter(Fecha == ymd("2020-04-05"),
         Casos_Confirmados > thresh) %>%
  mutate(Prop_Muertos = Casos_Muertos / Casos_Confirmados,
         Ranking = dense_rank(desc(Prop_Muertos))) %>%
  arrange(Ranking) %>%
  head(10) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Activos WHO_Region Casos_Enfermos Prop_Muertos Ranking
Italy 41.87194 12.567380 2020-04-05 128948 15887 21815 91246 Europe 91246 0.1232047 1
Algeria 28.03390 1.659600 2020-04-05 1320 152 90 1078 Africa 1078 0.1151515 2
France 46.22760 2.213700 2020-04-05 70478 8078 16183 46217 Europe 46217 0.1146173 3
Netherlands 52.13260 5.291300 2020-04-05 17851 1766 0 16085 Europe 16085 0.0989300 4
United Kingdom 55.37810 -3.436000 2020-04-05 60792 5866 135 54791 Europe 54791 0.0964930 5
Spain 40.46367 -3.749220 2020-04-05 131646 12641 38080 80925 Europe 80925 0.0960227 6
Indonesia -0.78930 113.921300 2020-04-05 2273 198 164 1911 South-East Asia 1911 0.0871095 7
Belgium 50.83330 4.469936 2020-04-05 19691 1447 3751 14493 Europe 14493 0.0734853 8
Morocco 31.79170 -7.092600 2020-04-05 1021 70 76 875 Eastern Mediterranean 875 0.0685602 9
Egypt 26.82055 30.802498 2020-04-05 1173 78 247 848 Eastern Mediterranean 848 0.0664962 10
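
The ranking in this table is driven by the ratio of deaths to confirmed cases, restricted to rows above the threshold of 1000 confirmed cases. A minimal plain-Python sketch of that calculation follows, using four rows copied from the table. It assigns plain positions after sorting, which coincide with dense_rank only when there are no ties; the variable names are illustrative.

```python
# (country, confirmed, deaths) on 2020-04-05, copied from the table in this section
rows = [
    ("Spain", 131646, 12641),
    ("Italy", 128948, 15887),
    ("France", 70478, 8078),
    ("Algeria", 1320, 152),
]
threshold = 1000

# Keep rows above the threshold and sort by death ratio, highest first.
ranked = sorted(
    ((name, deaths / confirmed) for name, confirmed, deaths in rows
     if confirmed > threshold),
    key=lambda item: item[1],
    reverse=True,
)
for position, (name, ratio) in enumerate(ranked, start=1):
    print(position, name, round(ratio, 7))
```

Note that this ratio reflects testing coverage as much as lethality: a country that confirms few mild cases shows a higher proportion of deaths.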

Segmenting region data into categories with mosaicplot

datos$lat_class = cut(datos$Latitud,
                      breaks = seq(from = -90, to = 90, by = 10))
datos$long_class = cut(datos$Longitud,
                       breaks = seq(from = -180, to = 180, by = 10))
tt = table(datos$lat_class, datos$long_class)
tt = tt[nrow(tt):1, ]
mosaicplot(t(tt), shade = TRUE)
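
The calls to cut split latitude and longitude into 10-degree classes that are open on the left and closed on the right, which is the default behaviour of cut. The following plain-Python sketch shows how a single coordinate maps to its class; the function name ten_degree_class is an assumption made for illustration.

```python
import math

def ten_degree_class(value: float, low: float, high: float):
    """Label of the right-closed 10-degree interval containing value, or None."""
    if not (low < value <= high):
        return None  # outside (low, high], where cut would yield NA
    upper = math.ceil(value / 10) * 10
    return f"({upper - 10},{upper}]"

lat_class = ten_degree_class(33.93911, -90, 90)    # Afghanistan's latitude
lon_class = ten_degree_class(67.70995, -180, 180)  # Afghanistan's longitude
print(lat_class, lon_class)
```

Because the lowest break is excluded, a latitude of exactly -90 or a longitude of exactly -180 falls into no class.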

Temporal analysis of the data

Evolution of infected vs. recovered vs. deaths

# Sum the cumulative counts of every region for each date
datos_por_fecha = aggregate(
  cbind(Casos_Confirmados, Casos_Muertos, Casos_Recuperados) ~ Fecha,
  data = datos,
  FUN = sum)

# Cases still ill = confirmed - deceased - recovered
datos_por_fecha$Casos_Enfermos = with(datos_por_fecha,
  Casos_Confirmados - Casos_Muertos - Casos_Recuperados)
head(datos_por_fecha)
##        Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
## 1 2020-01-22               555            17                28            510
## 2 2020-01-23               654            18                30            606
## 3 2020-01-24               941            26                36            879
## 4 2020-01-25              1434            42                39           1353
## 5 2020-01-26              2118            56                52           2010
## 6 2020-01-27              2927            82                61           2784
tail(datos_por_fecha)
##          Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
## 183 2020-07-22          15227725        623540           8541255        6062930
## 184 2020-07-23          15510481        633506           8710969        6166006
## 185 2020-07-24          15791645        639650           8939705        6212290
## 186 2020-07-25          16047190        644517           9158743        6243930
## 187 2020-07-26          16251796        648621           9293464        6309711
## 188 2020-07-27          16480485        654036           9468087        6358362
barplot(Casos_Confirmados ~ Fecha, data = datos_por_fecha)

# Worldwide cumulative series as lines on a logarithmic y axis
plot(Casos_Confirmados ~ Fecha, data = datos_por_fecha,
     col = "blue", type = "l", main = "Documented cases per day worldwide",
     xlab = "Date", ylab = "Number of people", log = "y")
lines(Casos_Muertos ~ Fecha, data = datos_por_fecha, col = "red")
lines(Casos_Recuperados ~ Fecha, data = datos_por_fecha, col = "green")
# The series are drawn as lines, so the legend uses line symbols rather than points
legend("topleft", c("Confirmed", "Deceased", "Recovered"),
       col = c("blue", "red", "green"), lty = 1, lwd = 2)

Analysis of the data for Spain

# Keep only the rows for Spain, with the date and the case columns
datos_spain = datos %>%
  filter(Pais_Region == "Spain") %>%
  select(Fecha, starts_with("Casos_"))

plot(x = datos_spain$Fecha, y = datos_spain$Casos_Confirmados,
     main = "Confirmed cases in Spain", type = "s",
     col = "blue", lwd = 2)

# Stacked bars: one bar per date, one segment per case type.
# Columns are selected by name so that they match the legend.
barplot(t(as.matrix(datos_spain[, c("Casos_Muertos", "Casos_Recuperados", "Casos_Enfermos")])),
        names.arg = datos_spain$Fecha,
        col = c("red", "green", "yellow"),
        main = "Cases by type in Spain",
        xlab = "Date",
        ylab = "Number of people")
legend("topleft", c("Deceased", "Recovered", "Ill"),
       fill = c("red", "green", "yellow"))

The xts library in temporal analysis

# Build a time series indexed by date from the worldwide totals
datos_por_fecha_ts <- xts(x = datos_por_fecha[, 2:5],
                          order.by = datos_por_fecha$Fecha)
dygraph(datos_por_fecha_ts) %>%
  dyOptions(labelsUTC = TRUE, labelsKMB = TRUE,
            fillGraph = TRUE, fillAlpha = 0.05,
            drawGrid = FALSE, colors = "#FF2D55") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE) %>%
  dyRoller(rollPeriod = 2)

# Same interactive chart, restricted to the series for Spain
datos_spain_ts <- xts(x = datos_spain[, 2:5],
                      order.by = datos_spain$Fecha)
dygraph(datos_spain_ts) %>%
  dyOptions(labelsUTC = TRUE, labelsKMB = TRUE,
            fillGraph = TRUE, drawGrid = FALSE, colors = "#FF2D55") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE) %>%
  dyRoller(rollPeriod = 2)

Lead, lag, and the study of new cases

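# Daily new cases = cumulative count minus the previous day's count.
# dplyr::lag is called explicitly so that another lag() cannot be picked up by mistake.
datos_spain %<>%
  mutate(Nuevos_Casos_Confirmados = Casos_Confirmados - dplyr::lag(Casos_Confirmados, n = 1),
         Nuevos_Casos_Muertos = Casos_Muertos - dplyr::lag(Casos_Muertos, n = 1),
         Nuevos_Casos_Recuperados = Casos_Recuperados - dplyr::lag(Casos_Recuperados, n = 1))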

# Restrict the plot to dates up to 12 April 2020
datos_spain_abril = datos_spain[datos_spain$Fecha <= as.Date("2020-04-12"), ]

plot(Nuevos_Casos_Confirmados ~ Fecha, data = datos_spain_abril,
     type = "l", col = "blue",
     xlab = "Date", ylab = "New cases",
     main = "New records in Spain")
lines(Nuevos_Casos_Muertos ~ Fecha, data = datos_spain_abril, col = "red")
lines(Nuevos_Casos_Recuperados ~ Fecha, data = datos_spain_abril, col = "green")

legend("topleft", c("Confirmed", "Deceased", "Recovered"),
       col = c("blue", "red", "green"),
       lty = 1, lwd = 2)

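The source data contain recording errors: some of the daily differences for new confirmed cases and new deaths are negative, which should not happen with cumulative counts unless earlier figures were revised. For that reason, the plot shows the evolution of new confirmed cases, new deaths and new recoveries only up to 12 April 2020 (the same cutoff used in the course).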
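
A short Python sketch may help show where such negative values come from. It computes the lag-1 difference of a cumulative series and lists the dates on which the difference is negative. The counts below are made up for illustration and are not taken from the dataset. `daily_new` is a hypothetical helper name.

```python
from datetime import date, timedelta


def daily_new(cumulative):
    """Lag-1 difference of a cumulative series; the first entry has no predecessor."""
    return [None] + [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts: the drop on the fourth day mimics a downward
# revision in the source, which produces a negative "new cases" value.
start = date(2020, 4, 20)
fechas = [start + timedelta(days=i) for i in range(5)]
confirmados = [100, 130, 170, 160, 190]

nuevos = daily_new(confirmados)
negativos = [f for f, n in zip(fechas, nuevos) if n is not None and n < 0]

print(nuevos)
print(negativos)
```

Listing the offending dates in this way is one option for deciding on a cutoff, instead of fixing it by eye.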

Cohort analysis

Days since the origin of the pandemic

primer_contagio = datos %>%
  group_by(Pais_Region) %>%
  filter(Casos_Confirmados > 0) %>%
  # Day before the first confirmed case in each country, used as day 0
  summarise(Primer_Contagio = min(Fecha) - 1)

data_first = datos %>%
  inner_join(primer_contagio, by = "Pais_Region") %>%
  mutate(Dias_Desde_PC = as.numeric(Fecha - Primer_Contagio)) %>%
  filter(Dias_Desde_PC >= 0) %>%
  group_by(Dias_Desde_PC, Pais_Region) %>%
  # Add up the provinces of each country for every day offset;
  # .groups = "drop_last" keeps the default grouping and silences the message
  summarise(Casos_Confirmados = sum(Casos_Confirmados),
            Casos_Muertos = sum(Casos_Muertos),
            Casos_Recuperados = sum(Casos_Recuperados),
            Casos_Enfermos = sum(Casos_Enfermos),
            .groups = "drop_last")
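
The cohort alignment above can be sketched in plain Python as follows. Each country's series is shifted so that day 0 is the day before its first confirmed case, and the counts are then summed per country and day offset. The records are toy values and `days_since_first_case` is a hypothetical helper, so this illustrates the idea and does not reproduce the pipeline exactly.

```python
from datetime import date, timedelta


def days_since_first_case(records):
    """records: list of (country, date, confirmed) tuples.

    Returns a dict mapping (country, day_offset) to the summed confirmed count,
    where day 0 is the day before the country's first confirmed case.
    """
    # Day 0 for each country: one day before its earliest date with cases
    origin = {}
    for country, d, confirmed in records:
        if confirmed > 0:
            candidate = d - timedelta(days=1)
            if country not in origin or candidate < origin[country]:
                origin[country] = candidate

    # Re-index every record by its offset from day 0 and drop earlier dates
    totals = {}
    for country, d, confirmed in records:
        if country not in origin:
            continue
        offset = (d - origin[country]).days
        if offset >= 0:
            key = (country, offset)
            totals[key] = totals.get(key, 0) + confirmed
    return totals


# Toy data: country "A" has its first case on 3 January, "B" on 1 January
dias = [date(2020, 1, 1) + timedelta(days=i) for i in range(4)]
records = [("A", d, c) for d, c in zip(dias, [0, 0, 3, 5])]
records += [("B", d, c) for d, c in zip(dias, [2, 4, 4, 9])]

cohortes = days_since_first_case(records)
for key in sorted(cohortes):
    print(key, cohortes[key])
```

After this re-indexing, both countries have their first confirmed cases at offset 1, which is what makes the curves comparable on a common axis.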

Plotting the cohorts

g <- data_first %>%
  # Optional: restrict the plot to a few countries
  # filter(Pais_Region %in% c("Spain", "Italy", "China", "US", "Germany")) %>%
  ggplot(aes(x = Dias_Desde_PC, y = Casos_Confirmados)) +
  geom_line(aes(col = Pais_Region)) +
  xlab("Days since the first contagion") +
  ylab("Number of people infected") +
  ggtitle("Cohort analysis") +
  theme(legend.position = "none")

# Convert the static ggplot into an interactive plotly chart
ggplotly(g)